
Word2Vec Models for Twenty-year Periods of 18C (ECCO, "Literature and Language")

by: Ryan Heuser

Publication date: 2016-09-24

Usage: Attribution 4.0 International

Topics: word2vec, eighteenth century, literature, digital humanities

Collection: opensource_media

Language: English


Creator: Ryan Heuser (heuser@stanford.edu | @quadrismegistus)

License: Creative Commons (CC BY-SA)

This is a zipped folder of five word2vec models. Each model was trained on 150 million words randomly sampled from the "Literature and Language" section of Eighteenth-Century Collections Online (ECCO, published by Gale). The five models correspond to the five twenty-year periods of the 18C: 1700-19, 1720-39, 1740-59, 1760-79, and 1780-99.
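The period boundaries can be made explicit with a small helper, for example when sorting one's own dated texts into the same bins. This is a hypothetical illustration and is not distributed with the models: `period_label` and the constants are names introduced here, and the labels simply follow the "1700-19" style used above.

```python
# Hypothetical helper (not part of the distributed files): map a
# publication year to the twenty-year period label used above.
PERIOD_START = 1700
PERIOD_END = 1799
PERIOD_LENGTH = 20


def period_label(year: int) -> str:
    """Return a label like '1740-59' for a year between 1700 and 1799."""
    if not PERIOD_START <= year <= PERIOD_END:
        raise ValueError(f"{year} is outside the 18th century")
    # Round down to the first year of the enclosing twenty-year bin.
    start = PERIOD_START + (year - PERIOD_START) // PERIOD_LENGTH * PERIOD_LENGTH
    end = start + PERIOD_LENGTH - 1
    # Abbreviate the end year to two digits, e.g. 1759 -> '59'.
    return f"{start}-{end % 100:02d}"


# The five periods, in chronological order.
PERIODS = [period_label(y) for y in range(PERIOD_START, PERIOD_END + 1, PERIOD_LENGTH)]

print(PERIODS)
print(period_label(1755))
```

Here `PERIODS` comes out as the five labels listed above, and a text dated 1755 falls in the 1740-59 bin.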

Each period has two files. One is a gzipped word2vec model saved in Google's word2vec format, which is readable by word2vec, gensim, and GloVe. The other is a plain-text vocabulary file with one [word][space][count] entry per line. Both files can be imported together with gensim like this:

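import gensim

# Load a model in the form: load_word2vec_format([gzipped model filename], fvocab=[vocabulary filename])
# Recent gensim versions provide this loader on KeyedVectors rather than on Word2Vec.
# binary=False because the ".txt" in the filename indicates word2vec's text format.
model = gensim.models.KeyedVectors.load_word2vec_format('word2vec.ECCO.1700-1719.skipgram_n=10.model.txt.gz', fvocab='word2vec.ECCO.1700-1719.skipgram_n=10.model.vocab.txt', binary=False)

# Test an analogy: Man is to woman as king is to ____?
print(model.most_similar(positive=['woman', 'king'], negative=['man']))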
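Where gensim is unavailable, the same two steps (reading both files, then running the analogy query) can be sketched with the Python standard library alone. This is a minimal sketch, not the author's code or gensim's implementation, and it assumes that the model is in word2vec's text (not binary) variant, as the `.txt.gz` filename suggests, and that both files are UTF-8. All function names are hypothetical. The analogy step ranks words by cosine similarity to `woman + king - man` computed on unit-length vectors, which is, as far as we understand it, also how gensim's `most_similar` combines its `positive` and `negative` words.

```python
import gzip
import math
from typing import Dict, Iterable, List, Sequence, Tuple

Vectors = Dict[str, List[float]]


def parse_vocab(lines: Iterable[str]) -> Dict[str, int]:
    """Parse '[word][space][count]' lines into a word -> count dict."""
    counts: Dict[str, int] = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        word, count = line.rsplit(maxsplit=1)
        counts[word] = int(count)
    return counts


def parse_word2vec_text(lines: Iterable[str]) -> Tuple[int, Vectors]:
    """Parse word2vec text format: a 'vocab_size dims' header line,
    then one 'word v1 ... v_dims' line per word."""
    it = iter(lines)
    dims = int(next(it).split()[1])
    vectors: Vectors = {}
    for line in it:
        parts = line.rstrip().split(" ")
        if len(parts) != dims + 1:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return dims, vectors


def load_period(model_path: str, vocab_path: str) -> Tuple[Vectors, Dict[str, int]]:
    """Read one period's gzipped text-format model and its vocabulary file."""
    with gzip.open(model_path, "rt", encoding="utf-8") as f:
        _, vectors = parse_word2vec_text(f)
    with open(vocab_path, encoding="utf-8") as f:
        counts = parse_vocab(f)
    return vectors, counts


def _unit(v: Sequence[float]) -> List[float]:
    """Scale a vector to length 1 (assumes it is not all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def analogy(vectors: Vectors, positive: List[str], negative: List[str],
            topn: int = 10) -> List[Tuple[str, float]]:
    """Rank words by cosine similarity to sum(positive) - sum(negative),
    using unit-length vectors and excluding the query words themselves."""
    units = {w: _unit(v) for w, v in vectors.items()}
    target = [0.0] * len(next(iter(units.values())))
    for w in positive:
        target = [t + x for t, x in zip(target, units[w])]
    for w in negative:
        target = [t - x for t, x in zip(target, units[w])]
    target = _unit(target)
    exclude = set(positive) | set(negative)
    # Dot products of unit vectors are cosine similarities.
    scores = [(w, sum(a * b for a, b in zip(target, u)))
              for w, u in units.items() if w not in exclude]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:topn]
```

Usage would look like `vectors, counts = load_period(model_file, vocab_file)` followed by `analogy(vectors, ['woman', 'king'], ['man'])`. On a full 50,000-word model this pure-Python loop is much slower than gensim's NumPy-based version; it is meant to make the arithmetic visible rather than to replace the library.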

The models were trained using a skip-gram size of ten words. To reduce file size, each model was pruned to the 50,000 most frequent words in its twenty-year sample.
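Both settings in the paragraph above can be illustrated with a short sketch. It rests on two assumptions that the description does not spell out: that "skip-gram size of ten words" means a context window of up to ten words on each side of the target word (as with word2vec's `-window` option), and that pruning kept the vectors of the 50,000 words with the highest counts in the vocabulary file. `prune_model` and `skipgram_pairs` are hypothetical names, not part of the distributed files or of gensim.

```python
from collections import Counter
from typing import Dict, List, Tuple

Vectors = Dict[str, List[float]]


def prune_model(vectors: Vectors, counts: Dict[str, int], n: int = 50_000) -> Vectors:
    """Keep only the vectors for the n most frequent words (by count)."""
    keep = {word for word, _ in Counter(counts).most_common(n)}
    return {w: v for w, v in vectors.items() if w in keep}


def skipgram_pairs(tokens: List[str], window: int = 10) -> List[Tuple[str, str]]:
    """List (target, context) pairs for every context word within
    `window` positions on either side of each target word."""
    pairs: List[Tuple[str, str]] = []
    for i, target in enumerate(tokens):
        start = max(0, i - window)
        stop = min(len(tokens), i + window + 1)
        for j in range(start, stop):
            if j != i:  # a word is not its own context
                pairs.append((target, tokens[j]))
    return pairs
```

The reference word2vec implementation also randomly shrinks the window for each target word during training, which this fixed-window sketch omits.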

Addeddate: 2016-09-24 08:29:41

Identifier: word-vectors-18c-word2vec-models-across-20-year-periods

Identifier-ark: ark:/13960/t4qk2b01n

Scanner: Internet Archive HTML5 Uploader 1.6.3

Year: 2016


In collections: Community Data, Community Collections

Uploaded by quadrismegistus on September 24, 2016
